Reducing the Performance Impact of Instruction Cache Misses

نویسندگان

Jared Stark

Paul Racunas

Yale N. Patt

چکیده

Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works, must be obtained from the Abstract In conventional processors, each instruction cache fetch brings in a group of instructions. Upon encountering an instruction cache miss, the processor will wait until the instruction cache miss is serviced before continuing to fetch any new instructions. This paper presents a new technique, called out-of-order issue, which allows the processor to temporarily ignore the instructions associated with the instruction cache miss. The processor attempts to fetch the instructions that follow the group of instructions associated with the miss. These instructions are then decoded and written into the processor's reservation stations. Later, after the instruction cache miss has been serviced, the instructions associated with the miss are decoded and written into the reservation stations. (We use the term issue to indicate the act of writing instructions into the reservation stations. With this technique, instructions are not written into the reservation stations in program order. Hence, the term out-of-order issue.) In this paper, we introduce the concept of out-of-order issue, describe its implementation , and present some initial data showing the performance gains possible with out-of-order issue.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Effect of Node Size on the Performance of Cache-Conscious Indices

In main-memory environments, the number of processor cache misses has a critical impact on the performance of the system. Cache-conscious indices are designed to improve the performance of mainmemory indices by reducing the number of processor cache misses that are incurred during a search operation. Conventional wisdom suggests that the index’s node size should be equal to the cache line size ...

متن کامل

Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions

As the degree of instruction-level parallelism in superscalar architectures increases, the gap between processor and memory performance continues to grow requiring more aggressive techniques to increase the performance of the memory system. We propose a new technique, which is based on the wrong-path execution of loads far beyond instruction fetch-limiting conditional branches, to exploit more ...

متن کامل

Temporal-Based Procedure Reordering for Improved Instruction Cache Performance

As the gap between memory and processor performance continues to grow, it becomes increasingly important to exploit cache memory effectively. Both hardware and software techniques can be used to better utilize the cache. Hardware solutions focus on organization, while most software solutions investigate how to best layout a program on the available memory space. In this paper we present a new l...

متن کامل

Understanding the Backward Slices of Performance Degrading Instructions Craig

For many applications, branch mispredictions and cache misses limit a processor's performance to a level well below its peak instruction throughput. A small fraction of static instructions, whose behavior cannot be anticipated using current branch predictors and caches, contribute a large fraction of such performance degrading events. This paper analyzes the dynamic instruction stream leading u...

متن کامل

The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture

Concurrent multithreaded architectures exploit both instructionlevel and thread-level parallelism in application programs. A single-threaded sequencing mechanism needs speculative execution beyond conditional branches in order to exploit more instruction-level parallelism. In addition, an aggressive multithreaded architecture should also use thread-level control speculation in order to exploit ...

متن کامل

The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

ÐCurrent microprocessors incorporate techniques to aggressively exploit instruction-level parallelism (ILP). This paper evaluates the impact of such processors on the performance of shared-memory multiprocessors, both without and with the latencyhiding optimization of software prefetching. Our results show that, while ILP techniques substantially reduce CPU time in multiprocessors, they are les...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1997

Reducing the Performance Impact of Instruction Cache Misses

نویسندگان

چکیده

منابع مشابه

Effect of Node Size on the Performance of Cache-Conscious Indices

Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions

Temporal-Based Procedure Reordering for Improved Instruction Cache Performance

Understanding the Backward Slices of Performance Degrading Instructions Craig

The Effect of Executing Mispredicted Load Instructions in a Speculative Multithreaded Architecture

The Impact of Exploiting Instruction-Level Parallelism on Shared-Memory Multiprocessors

عنوان ژورنال:

اشتراک گذاری